21 research outputs found

XEngine : Optimal Tensor Rematerialization for Neural Networks in Heterogeneous Environments

Author: Membarth Richard
Schuler Manuela
Slusallek Philipp
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2022

Memory efficiency is crucial in training deep learning networks on resource-restricted devices. During backpropagation, forward tensors are used to calculate gradients. Rather than keeping those dependencies in memory until they are reused in backpropagation, some forward tensors can be discarded and recomputed later from saved tensors, so-called checkpoints. In particular, this allows resource-constrained heterogeneous environments to make use of all available compute devices. Unfortunately, the definition of these checkpoints is a non-trivial problem and poses a challenge to the programmer: improper or excessive recomputations negate the benefit of checkpointing. In this article, we present XEngine, an approach that schedules network operators to heterogeneous devices in low-memory environments by determining checkpoints and recomputations of tensors. Our approach selects suitable resources per timestep and operator and optimizes the end-to-end time for neural networks, taking the memory limitation of each device into account. For this, we formulate a mixed-integer quadratic program (MIQP) to schedule operators of deep learning networks on heterogeneous systems. We compare our MIQP solver XEngine against Checkmate [12], a mixed-integer linear programming (MILP) approach that solves recomputation on a single device. Our solver finds solutions that are up to 22.5% faster than the fastest Checkmate schedule in which the network is computed exclusively on a single device. We also find valid schedules for networks making use of both central processing units and graphics processing units if memory limitations do not allow scheduling exclusively to the graphics processing unit.
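
As a rough illustration of the checkpointing idea this abstract describes, and not of XEngine's MIQP scheduler itself, the following sketch keeps only every k-th forward activation of a toy chain of tanh layers and recomputes the missing ones during the backward pass. The toy network and all names (`forward`, `recompute`, `backward`, `checkpoint_every`) are hypothetical and chosen only for this example.

```python
import math


def forward(x, n_layers, checkpoint_every):
    """Run a chain of tanh layers, saving only every k-th activation."""
    saved = {0: x}  # the input is always kept as a checkpoint
    h = x
    for i in range(1, n_layers + 1):
        h = math.tanh(h)
        if i % checkpoint_every == 0:
            saved[i] = h  # checkpoint: kept in memory
        # all other activations are discarded after use
    return h, saved


def recompute(saved, i):
    """Rebuild activation i from the nearest earlier checkpoint."""
    start = max(k for k in saved if k <= i)
    h = saved[start]
    for _ in range(i - start):
        h = math.tanh(h)
    return h


def backward(saved, n_layers):
    """Gradient of the output w.r.t. the input of the tanh chain.

    With a_i = tanh(a_{i-1}), the local derivative is 1 - a_i**2,
    so the full gradient is the product over all layers.
    """
    grad = 1.0
    for i in range(n_layers, 0, -1):
        a_i = recompute(saved, i)  # recomputed if it was not checkpointed
        grad *= 1.0 - a_i * a_i
    return grad


# Storing everything versus checkpointing every 4th layer.
_, saved_full = forward(0.5, 8, checkpoint_every=1)
_, saved_ckpt = forward(0.5, 8, checkpoint_every=4)
grad_full = backward(saved_full, 8)
grad_ckpt = backward(saved_ckpt, 8)
```

Both runs yield the same gradient, but the checkpointed run stores fewer activations and pays for it with extra forward computations. Choosing which tensors to keep, and on which device to compute them, is the optimization problem the abstract refers to.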

Universaar

Parallel Multi-Hypothesis Algorithm for Criticality Estimation in Traffic and Collision Avoidance

Author: Botsch Michael
Dirndorfer Tobias
Gaull Andreas
Kammenhuber Alexander
Lauer Christoph
Membarth Richard
Morales Eduardo Sánchez
Slusallek Philipp
Publication date: 13/05/2020

Due to the current developments towards autonomous driving and vehicle active safety, there is an increasing need for algorithms that can perform complex criticality predictions in real time. Being able to process multi-object traffic scenarios aids the implementation of a variety of automotive applications such as driver assistance systems for collision prevention and mitigation as well as fall-back systems for autonomous vehicles. We present a fully model-based algorithm with a parallelizable architecture. The proposed algorithm can evaluate the criticality of complex, multi-modal (vehicles and pedestrians) traffic scenarios by simulating millions of trajectory combinations and detecting collisions between objects. The algorithm is able to estimate upcoming criticality at very early stages, demonstrating its potential for vehicle safety systems and autonomous driving applications. A prototypical implementation on an embedded system in a test vehicle demonstrates that the algorithm is compatible with the hardware capabilities of modern cars. For a complex traffic scenario with 11 dynamic objects, more than 86 million pose combinations are evaluated in 21 ms on the GPU of a Drive PX 2.

arXiv.org e-Print Archive

Crossref

Scipedia

Acceleration of Multiresolution Imaging Algorithms: A Comparative Study

Author: Frank Hannig
Hritam Dutta
Jürgen Teich
Richard Membarth
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 28/12/2009

In this paper we consider a multiresolution filter and its realization on the Cell BE and GPUs. We present not only common and specific optimization strategies for obtaining maximum performance on these architectures, but also how to obtain a speedup of 6.57x and 33.24x compared to an optimized OpenMP baseline implementation. Furthermore, we undertake automated configuration space exploration of different partitioning possibilities to select the best tiling parameters.

CiteSeerX

Crossref

FSM-controlled architectures for linear invasion

Author: Abdulazim Amouri
Farhadur Arifin
Frank Hannig
Jürgen Teich
Richard Membarth
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009

Invasive computing is a novel concept in multiprocessor architecture and programming. Invasion will become an important step towards self-organizing behavior which will be needed in the next generation of massively parallel MPSoCs with unrivaled performance and resource efficiency numbers as one of the main challenges for MPSoC apart from their programming. In this paper we introduce and model a finite state machine for controlling the invasive process in different architectural granularities. The applicability of our FSM is tested in case studies for a reconfigurable MPSoC platform and a fine-grained platform. The results show substantial flexibility gains with only marginal additional hardware cost.

CiteSeerX

Crossref

Shallow embedding of DSLs via online partial evaluation

Author: Boesche Klaas
Hack Sebastian
Leißa Roland
Membarth Richard
Slusallek Philipp
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2015

CISPA – Helmholtz-Zentrum für Informationssicherheit

MAnnheim DOCument Server

Target-specific refinement of multigrid codes

Author: Hack Sebastian
Köster Marcel
Leißa Roland
Membarth Richard
Slusallek Philipp
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

CISPA – Helmholtz-Zentrum für Informationssicherheit

MAnnheim DOCument Server

Rodent: Generating renderers without writing a generator

Author: Hack Sebastian
Leißa Roland
Membarth Richard
Pérard-Gayot Arsène
Slusallek Philipp
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2019

MAnnheim DOCument Server